Model-based Adversarial Imitation Learning
Abstract
Generative adversarial learning is a popular approach to training generative models that has also proven successful on related problems. The general idea is to maintain an oracle D that discriminates between the expert’s data distribution and that of the generative model G. The generative model is trained to capture the expert’s distribution by maximizing the probability that D misclassifies the data it generates. Overall, the system is differentiable end-to-end and is trained using basic backpropagation. This type of learning has been successfully applied to the problem of policy imitation in a model-free setup. However, a model-free approach does not allow the system to be differentiable, which forces the use of high-variance gradient estimates. In this paper we introduce the Model-based Adversarial Imitation Learning (MAIL) algorithm, a model-based approach to the problem of adversarial imitation learning. We show how to use a forward model to make the system fully differentiable, which enables us to train policies using the (stochastic) gradient of D. Moreover, our approach requires relatively few environment interactions and fewer hyper-parameters to tune. We test our method on the MuJoCo physics simulator and report initial results that surpass the current state-of-the-art.
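The contrast the abstract draws between high-variance model-free gradient estimates and gradients propagated through a forward model can be illustrated with a toy sketch. The code below is not the MAIL algorithm itself: the scalar state, the linear-Gaussian policy, the fixed logistic discriminator, the linear forward model, and all constants (`W`, `B`, `SIGMA`, `MODEL_GAIN`) are assumptions chosen only for illustration. It compares a score-function (REINFORCE-style) estimate with a pathwise estimate that differentiates through D, the forward model, and the policy.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy setup (not from the paper): scalar state and action.
W, B = 1.5, -0.5      # fixed discriminator D(s') = sigmoid(W * s' + B)
SIGMA = 0.5           # standard deviation of the policy noise
MODEL_GAIN = 1.0      # assumed forward model f(s, a) = s + MODEL_GAIN * a


def sample_gradients(theta, s, rng):
    """Return one-sample (pathwise, score_function) estimates of
    d/dtheta E[log D(f(s, a))], where a = theta * s + SIGMA * eps."""
    eps = rng.gauss(0.0, 1.0)
    a = theta * s + SIGMA * eps          # reparameterized action
    s_next = s + MODEL_GAIN * a          # forward-model prediction
    d = sigmoid(W * s_next + B)
    # Pathwise: chain rule through D, the forward model, and the policy.
    # d/dz log sigmoid(z) = 1 - sigmoid(z).
    pathwise = (1.0 - d) * W * MODEL_GAIN * s
    # Score function: log D(s') times the gradient of log pi(a | s).
    score = math.log(d) * (a - theta * s) / SIGMA**2 * s
    return pathwise, score


def compare(theta=0.3, s=1.0, n=20000, seed=0):
    """Estimate the mean and variance of both gradient estimators."""
    rng = random.Random(seed)
    samples = [sample_gradients(theta, s, rng) for _ in range(n)]
    path = [p for p, _ in samples]
    score = [q for _, q in samples]
    return {
        "pathwise_mean": statistics.fmean(path),
        "score_mean": statistics.fmean(score),
        "pathwise_var": statistics.pvariance(path),
        "score_var": statistics.pvariance(score),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.4f}")
```

In this toy setting both estimators target the same gradient, so their sample means should roughly agree, while the pathwise estimate is expected to show much lower variance. The pathwise route is only available because the forward model makes the next state a differentiable function of the action.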
Similar resources
Generative Adversarial Imitation Learning
Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert’s cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a...
End-to-End Differentiable Adversarial Imitation Learning
Generative Adversarial Networks (GANs) have been successfully applied to the problem of policy imitation in a model-free setup. However, the computation graph of GANs that include a stochastic policy as the generative model is no longer differentiable end-to-end, which requires the use of high-variance gradient estimation. In this paper, we introduce the Model-based Generative Adversarial Imit...
Energy-Based Sequence GANs for Recommendation and Their Connection to Imitation Learning
Recommender systems aim to find an accurate and efficient mapping from historic data of user-preferred items to a new item that is to be liked by a user. Towards this goal, energy-based sequence generative adversarial nets (EB-SeqGANs) are adopted for recommendation by learning a generative model for the time series of user-preferred items. By recasting the energy function as the feature functi...
Learning a Visual State Representation for Generative Adversarial Imitation Learning
Imitation learning is a branch of reinforcement learning that aims to train an agent to imitate an expert’s behaviour, with no explicit reward signal or knowledge of the world. Generative Adversarial Imitation Learning (GAIL) is a recent model that performs this very well, in a data-efficient manner. However, it has only been used with low-level, low-dimensional state information, with few resu...
Socially-compliant Navigation through Raw Depth Inputs with Generative Adversarial Imitation Learning
We present an approach for mobile robots to learn to navigate in pedestrian-rich environments via raw depth inputs, in a social-compliant manner. To achieve this, we adopt a generative adversarial imitation learning (GAIL) strategy for motion planning, which improves upon a supervised policy model pre-trained via behavior cloning. Our approach overcomes the disadvantages of previous methods, as...
Journal: CoRR
Volume: abs/1612.02179
Publication date: 2016